Unraveling the Mechanics of a Repeat-Protein Nanospring: From Folding of Individual Repeats to Fluctuations of the Superhelix

Tandem-repeat proteins comprise small secondary structure motifs that stack to form one-dimensional arrays with distinctive mechanical properties that are proposed to direct their cellular functions. Here, we use single-molecule optical tweezers to study the folding of consensus-designed tetratricopeptide repeats (CTPRs), superhelical arrays of short helix-turn-helix motifs. We find that CTPRs display a spring-like mechanical response in which individual repeats undergo rapid equilibrium fluctuations between partially folded and unfolded conformations. We rationalize the force response using Ising models and dissect the folding pathway of CTPRs under mechanical load, revealing how the repeat arrays form from the center toward both termini simultaneously. Most strikingly, we also directly observe the protein’s superhelical tertiary structure in the force signal. Using protein engineering, crystallography, and single-molecule experiments, we show that the superhelical geometry can be altered by carefully placed amino acid substitutions, and we examine how these sequence changes affect intrinsic repeat stability and inter-repeat coupling. Our findings provide the means to dissect and modulate repeat-protein stability and dynamics, which will be essential for researchers to understand the function of natural repeat proteins and to exploit artificial repeat proteins in nanotechnology and biomedical applications.

[Figure legend: CTPRa5c, CTPRa5cy, CTPRa5y, CTPRrv5c, CTPRrv5cy, CTPRrv5y]
FIG. S1. Circular dichroism (CD) data of all 5-repeat constructs used in this study, reported as mean residue ellipticity ($\theta_{MR}$). (A) CD spectra are shown as the mean (line) and estimated error (shaded area). Although the signal at 222 nm remains largely unchanged, the mutations in the rv-type arrays appear to decrease the signal at 208 nm relative to that of the CTPRa arrays. This may reflect either changes in helix coiling within the tertiary structure or simply the loss of aromatic residues, which are known to contribute to the CD signal at these wavelengths. (B) The changes in mean residue ellipticity at 222 nm, indicative of α-helicity, are small, if not negligible, given the uncertainty in the protein concentration measurements between different samples (approximately 10%).

FIG. S3. Equilibrium denaturation data of CTPR arrays using guanidine hydrochloride. (A) Attachment variants of CTPRrv5, CTPRrv10 and CTPRa5 were tested to examine the effect of the added ybbR-tag or cysteine residues at the N- and C-termini. While cysteine modifications did not alter the unfolding profile, the ybbR-tag slightly altered both the transition mid-point and the slope of the transition. We intentionally did not display any fits, since (i) TPRs with more than three repeats clearly deviate from two-state behavior and (ii) the number of variants was not sufficient to build ensemble heteropolymer Ising models that treated the ybbR-tag as a separate helix with different intrinsic stability and interaction energy at the N- and C-terminal interfaces of the CTPR array. (B) Ensemble Ising models require a global fitting procedure to denaturation data of a series of rv-type arrays with increasing number of repeats. Here, the fits to a homopolymer repeat model are displayed together with the resulting values for $\Delta G_{\mathrm{unit}}$ and $\Delta G_{\mathrm{nn}}$. A heteropolymer helix model that treated the A- and B-helices differently was not fitted, as it would result in over-parametrization of the data (6 free parameters versus only 3 used for the homopolymer repeat model). Experiments were performed in technical triplicates in 96-well plate format, and all data are represented as averages with corresponding standard errors.

Unfolding profiles for all measured CTPRrv (A) and CTPRa (B) constructs. Color maps represent the probability for each helix to be folded as a function of trap distance (please note that indexing proceeds from the C-terminus to the N-terminus in this case). (C) A zoom of the CTPRa9 data exemplifies how unfolding starts at the N- and C-termini: in all cases, unfolding starts with the C-terminal helix and proceeds with the unfolding of (more or less) paired helices from both ends.

Contour length histograms of the final "dip", as measured (roughly) from the end of the plateau to the unfolded contour. Shown are data extracted from FDCs collected at 10 and 100 nm/s of CTPRrv (blue) and CTPRa (green). The mean and standard errors for each repeat type are shown. As a reference, the expected contour length increase corresponding to, on average, 6 helices unfolding is approximately 34 nm, while that of 7 helices unfolding is 38 nm (differences between the two repeat types are less than 1 nm).

under conditions similar to AFM in which linker molecules are much shorter and the protein is tethered between a surface and a much stiffer cantilever. Here we used the structure of the consensus ankyrin NI3C modeled using the I-Tasser webserver (using all default values [1]), and previously reported values for the energetic parameters of $\Delta G_{\mathrm{unit}} = 5.56\,k_B T$ and $\Delta G_{\mathrm{nn}} = -24\,k_B T$ [2]. Please note that, given this particular structure, our results indicate unfolding from the N-terminus to the C-terminus, which is contrary to previous findings.

Intrinsic repeat energy $\Delta G_{\mathrm{unit}}$ and repeat next-neighbor interaction energy $\Delta G_{\mathrm{nn}}$ (see eq. (S15)). $\Delta G_{\mathrm{tot}} = N \Delta G_{\mathrm{unit}} + (N - 1)\Delta G_{\mathrm{nn}}$ is the total energy for an $N$-mer.

III. MATERIALS
All reagents were purchased from Sigma Aldrich, New England Biolabs (NEB), ThermoFisher, Merck or Asco Chemicals unless otherwise stated. 2x yeast tryptone (2xYT) and Lysogeny Broth (LB) Miller were purchased from Formedium. Unmodified DNA oligonucleotides were purchased from Integrated DNA Technologies (IDT) or Sigma Aldrich. Synthetic genes were purchased from IDT. FastDigest restriction enzymes (ThermoFisher), Phusion High-Fidelity DNA polymerase (NEB), and QuickStick Ligase (Bioline, discontinued) or the Anza T4 Ligase Master Mix (Invitrogen) were used for all cloning processes. E. coli strains for molecular biology were purchased from Bioline (α-select Competent Cells, Gold/Bronze Efficiency, discontinued) or NEB (NEB 5-alpha Competent E. coli, High Efficiency). E. coli cells for expression were generated in-house from C41 cells obtained from the Komander Lab (MRC-LMB, Cambridge). All constructs were expressed in vectors based on a pRSET backbone (ampicillin resistance).

IV. PROTEIN SEQUENCES
The majority of CTPRs used for this study are based on the consensus sequence containing (a) the terminal RS residues arising from the BglII restriction site that is required for constructing longer repeat arrays [3,4], and (b) the QK mutation for charge balancing of the final repeat protein [5]. The four-repeat construct used for crystallography was purchased as a synthetic gene, and contained the consensus asparagine residues at the repeat termini as well as a solvating helix.
In the following sequences the pre/suffixes c and y identify cysteine and ybbR-tag attachment points for handles.

V. EXPERIMENTAL METHODS
A. Molecular biology

Mutagenesis
For Round-the-Horn site-directed mutagenesis (RTH-SDM, [6,7]), 100 µM primers containing the required mutation/insertion in the overhang were phosphorylated using polynucleotide kinase (ThermoFisher) according to the manufacturer's protocol. Phosphorylated primers were stored at −20 °C until required. The mutation was inserted by PCR, and products were DpnI-digested and gel-purified. About 50 to 100 µg of DNA material was added to 1 µL Anza T4 Ligase Master Mix in a total volume of 4 µL, incubated for 10 to 20 min at room temperature and transformed into E. coli. Plasmids were isolated from individual colonies and tested for the presence of the correct mutation/insertion by Sanger sequencing (Eurofins).

General repeat array construction
DNA constructs of CTPR proteins in a pRSET backbone were built sequentially from one-, two- and four-repeat modules using BamHI/BglII cloning as previously described [8]. CTPR repeats are preceded by a BamHI restriction site and followed by a BglII restriction site, a double stop codon and a HindIII restriction site (Fig. S12). A vector containing M repeats was digested using BglII, HindIII and FastAP Thermosensitive Alkaline Phosphatase (ThermoFisher) according to the manufacturer's specifications, and purified using the QIAquick gel extraction protocol. Inserts of up to two repeats were produced by PCR amplification using T7-forward and -terminator sequencing primers. The PCR product was purified according to the QIAquick PCR purification protocol, and digested using BamHI and HindIII followed by heat-inactivation of the enzymes according to the manufacturer's specifications. Inserts containing more than two repeats were obtained by restriction digest using BamHI and HindIII and gel extraction. Since BamHI and BglII produce the same 5'-overhangs, the N-repeat construct was then ligated directly into the vector using QuickStick (according to the manufacturer's protocol) or Anza T4 ligase (reduced reaction volume as described above), transformed into high-efficiency E. coli cells, and the plasmid was purified according to QIAGEN protocols. The whole procedure was repeated until the desired number of repeats was obtained. Using synthetic genes of single repeats, all constructs without tags for DNA attachment were generated this way, and were subsequently used to produce the tagged variants. The construct used for crystallization was obtained as a synthetic gene (Integrated DNA Technologies) and was sub-cloned using the BamHI and HindIII restriction sites. For short arrays (e.g. up to 8 repeats) DNA sequencing could verify the exact number of repeats. 
Longer arrays were sequenced from both termini to verify the exact cloning boundaries and digested using BamHI and HindIII to determine the number of repeats.

Construction of yCTPRrv3y and yCTPRrv5y
Using RTH-SDM, the 11-amino acid ybbR-tag (DSLEFIASKLA) was inserted sequentially between (a) the BamHI restriction site and a TPR, and (b) the BglII site and the stop codons in a construct containing only one repeat (see Fig. S12A, Tab. S5). After digestion with BglII, two and four repeats obtained from BamHI-BglII digests were added at once. The correct orientation of the inserts was identified by restriction digest and Sanger sequencing.

Construction of yCTPRrv10y, yCTPRrv20y and yCTPRrv26y
First, ybbR-tags were introduced by RTH mutagenesis directly adjacent to the repeat sequence, either N-terminally or C-terminally of a single repeat, giving rise to yCTPRrv1 and CTPRrv1y, respectively. Second, the required number of repeats was added to yCTPRrv1 two or four repeats at a time, resulting in yCTPRrv9, yCTPRrv19 and yCTPRrv25. Last, the C-terminally tagged repeat was added to produce constructs with 10, 20 and 26 repeats that contained both N- and C-terminal ybbR-tags.

Construction of yCTPRa5y, yCTPRa9y, cCTPRrv5c and cCTPRa5c
To facilitate ybbR-tagged construct generation, a pRSET vector was modified using RTH-SDM to contain an N-terminal ybbR-tag between the TEV cleavage and BamHI restriction sites, and a C-terminal ybbR-tag between the HindIII restriction site and a stop codon (Fig. S12B, Tab. S5). The restriction sites give rise to additional amino acids between the individual ybbR-tags and the protein: GS at the N-terminus and KL at the C-terminus. CTPRa5 and CTPRa10 were assembled in this vector by BamHI/BglII cloning. However, the last two repeats inserted were obtained by a PCR omitting the stop codons (Tab. S5) such that the C-terminal ybbR-tag was in frame. Recombination of CTPRa10 by E. coli resulted in a 9-repeat instead of a 10-repeat construct. Since the exact repeat number was irrelevant to our study, we proceeded with this construct. Due to recombination it was not possible to obtain any CTPRa constructs with ≥ 10 repeats.
Proteins containing terminal cysteine residues were created in a similar manner using the same vector but with each ybbR-tag exchanged to a single cysteine (Tab. S5). The CTPRa5 was transferred directly from the corresponding ybbR construct, while the CTPRrv5 had to be re-assembled from a 4-repeat construct fused to a repeat obtained by PCR and without stop codon (Tab. S5).

B. Protein preparation
Plasmids encoding N-terminally H6-tagged CTPR proteins were transformed into C41 E. coli and plated on LB agar containing 100 µg/mL ampicillin. All colonies were used to inoculate 0.5 L of 2xYT media and grown at 37 °C until an optical density between OD600 = 0.6 and OD600 = 0.8 was reached, and protein expression was induced with 0.5 mM IPTG for 3 to 5 hours at 37 °C. After lysis, the cell suspension was heated to 70 to 80 °C in a water bath to denature the majority of soluble cellular contaminants. The soluble protein was separated from denatured and insoluble protein fractions by centrifugation for 30 min at 35 000 ×g, filtered through a 0.22 µm PES membrane and applied to a 5 mL HisTrap Excel column connected to an Äkta Pure chromatography system and equilibrated in wash buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 20 mM imidazole, SIGMAFAST Protease Inhibitor Cocktail (Sigma), DNase I (Sigma), lysozyme (Sigma)). The column was washed with 20 column volumes of wash buffer before proteins were eluted using a high-imidazole buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 300 mM imidazole). All fractions containing protein were pooled and, if necessary, concentrated using a Vivaspin centrifugal concentrator (Sartorius) with the appropriate molecular weight cutoff. The protein was then further purified by size exclusion chromatography using a HiLoad 26/600 Superdex 75 pg or HiLoad 16/600 Superdex 75 pg column (GE Healthcare) equilibrated in either Tris or phosphate buffer (50 mM Tris-HCl pH 7.5 or 50 mM sodium phosphate pH 6.8, 150 mM NaCl). Constructs with 10 repeats or more exhibited significant recombination, resulting in proteins that had a decreasing number of repeats. Hence, only the first few fractions of the elution peak were pooled for concentration, while >60 % of the fractions had to be discarded. The CTPRrv4 construct used for crystallography was purified essentially as above but in buffers based on 50 mM sodium phosphate pH 6.8, 150 mM NaCl. 
After elution from the resin with buffer containing imidazole, the protein was dialysed against 50 mM sodium phosphate pH 6.8, 150 mM NaCl for 18 hours in the presence of thrombin (MP Biomedical) to remove the H6-tag from the construct. The protein was further purified using a HiLoad 26/600 Superdex 75 pg column (GE Healthcare) equilibrated in 10 mM HEPES pH 7.5, 150 mM NaCl, and concentrated to 20 mg/mL.
The intact mass of all constructs was confirmed by mass spectrometry.

C. Equilibrium denaturation
Samples of a total volume of 150 µL were prepared in a 96-well format (Greiner, medium-binding), in 50 mM sodium phosphate pH 6.8, 150 mM NaCl with guanidinium hydrochloride (GdHCl) gradients of 0 to 4.5 M (CTPRrv2 and yCTPRrv3y) or 0 to 7 M (all other proteins) [9]. The exact denaturant concentration was calculated using the refractive indices of the native and denaturing buffers. A semi-automatic Hamilton Syringe unit was used to dispense the denaturant gradient. The final protein concentration was adjusted for each construct, depending on repeat type (presence/absence of one tryptophan per repeat) and array length, and ranged from <1 µM (large CTPRrv and all CTPRa constructs) to >11 µM (CTPRrv2). Samples were incubated on an orbital shaker at 25 °C for 2 h. Tryptophan residues were excited at 295 ± 10 nm and fluorescence was monitored at 360 ± 10 nm using a CLARIOStar microplate reader (BMG Labtech). Due to the deletion of tryptophan residues from the CTPRrv variant, tyrosine residues were excited at 280 ± 10 nm and their fluorescence measured at 330 ± 10 nm. The data from 9 reads were averaged and normalized. The resulting fluorescence curve, $F$, was converted to the fraction of folded protein, $\theta$, or unfolded protein, $1 - \theta$, using

$$\theta = \frac{F - (\alpha_U + \beta_U D)}{(\alpha_N + \beta_N D) - (\alpha_U + \beta_U D)},$$

where $D$ is the denaturant concentration, and $\alpha_N + \beta_N D$ and $\alpha_U + \beta_U D$ describe the baselines at low (native) and high (unfolded) denaturant concentrations. Parameters for the baselines were extracted by fitting a two-state unfolding equation to the whole data set or by two separate linear fits to the baselines only.
To extract the intrinsic and interfacial energies ($\Delta G_{\mathrm{unit}}$ and $\Delta G_{\mathrm{nn}}$), a homopolymer repeat Ising model was globally fit to denaturation data of un-tagged constructs with N = 2, 4, 5, 8 and 10 repeats using the PyFolding suite [10], the code of which is based on the formalism developed by Barrick and co-workers [11]. We did not fit a heteropolymer helix model as this would lead to overparametrization (6 free parameters vs. 5 data sets).
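The baseline correction that converts a fluorescence reading into a folded fraction can be sketched in a few lines of Python. This is a minimal illustration only: the function name and argument names are hypothetical, and the linear baselines are assumed to have been fitted beforehand.

```python
def fraction_folded(signal, denaturant, a_n, b_n, a_u, b_u):
    """Convert a fluorescence reading into the folded fraction theta.

    a_n + b_n * D is the native baseline, a_u + b_u * D the unfolded baseline.
    All names are illustrative, not taken from the original analysis code.
    """
    native = a_n + b_n * denaturant      # native baseline at this denaturant concentration
    unfolded = a_u + b_u * denaturant    # unfolded baseline at this denaturant concentration
    return (signal - unfolded) / (native - unfolded)
```

A reading that lies exactly halfway between the two baselines yields a folded fraction of 0.5, i.e. the transition midpoint.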

D. Crystallography
CTPRrv4 at 20 mg/mL was crystallized in JCSG-plus screen, well B10 (0.2 M MgCl2, 0.1 M sodium cacodylate, pH 6.5 and 50% v/v PEG 200, Molecular Dimensions) in sitting drop plates (SwissSci, Molecular Dimensions) with 600 nL droplets in 1:1 and 1:2 ratios of protein to well solution. Crystals were looped and flash frozen without further cryoprotectants. Crystals diffracted to 3.0 Å resolution on beamline I04 at Diamond Light Source (Oxford, UK). The data were processed using autoPROC [12] with the determination of diffraction limits set by a local I/σI ≥ 1.50. The phase was solved by molecular replacement using a CTPRa4 structure (PDB accession code: 2hyz) with two molecules in the asymmetric unit. Refinements were performed using BUSTER version 2.10.3 [13,14] and iterative model building in Coot [15]. We conservatively modeled phosphate molecules in the concave face of the TPR superhelix, since this buffer was present during all purification steps prior to size exclusion chromatography. Further details on collection and refinement statistics can be found in Table S1. Models of proteins containing more than 4 repeats were created by symmetry transformation in PyMOL, and missing residues and peptide bonds, e.g. between individual 4-mers, were added using MODELLER [16].

E. Calculation of plane angles
Changes in geometry between different repeat protein structures can be measured on two levels: (a) by comparing the whole repeat array (e.g. the superhelical arrangement in the case of TPRs), or (b) by comparing the angular differences between repeat planes. Dimensions of the TPR superhelix were estimated using the "Structure Measurements" tool of UCSF Chimera [17] and 20-repeat models of both repeat types. Calculations for obtaining angles between repeat planes were adapted from Forwood et al. [18]. In brief, a principal component analysis (PCA) is performed on the Cα-atom coordinates of each repeat, omitting the inter-repeat loops, to calculate the principal components (PCs, Fig. S13A) that are orientated along the length (PC1, purple), width (PC2, blue) and depth (PC3, green) of the repeat. As previously reported, curvature is defined as the angle between the respective PC2s of repeats $i$ and $i+1$ projected onto the plane of repeat $i+1$, twist is the angle between PC1s projected onto the plane formed by PC1$_{i+1}$ and PC3$_{i+1}$, and lateral bending is the angle of PC3s projected onto the plane formed by PC1$_{i+1}$ and PC3$_{i+1}$ (Fig. S13B). Next, some conventions were introduced to ensure the correct direction (positive or negative) of the angle: (i) PC1 always has the same orientation as the superhelical axis, which is defined by the right-hand rule from the N- to C-terminal direction of the polypeptide chain [19], (ii) PC3 points in the same direction as a vector from the centroid of repeat $i$ to the centroid of repeat $i+1$, and (iii) PC2 has the same direction as the cross product of PC3 with PC1. All calculations were performed using custom-written Python scripts with NumPy and Matplotlib extensions [20][21][22][23].
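The core geometric step in the plane-angle calculation, measuring a signed angle between two principal-component vectors after projecting them onto a reference plane, can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the PCA itself is omitted, the helper names are hypothetical, and which plane normal corresponds to curvature, twist or lateral bending is left to the caller.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def project_onto_plane(v, normal):
    """Remove the component of v that lies along the plane normal."""
    scale = dot(v, normal) / dot(normal, normal)
    return tuple(a - scale * n for a, n in zip(v, normal))

def signed_projected_angle(u, v, normal):
    """Angle in degrees from u to v after projecting both onto the plane
    perpendicular to `normal`; the sign follows the right-hand rule about it."""
    pu = project_onto_plane(u, normal)
    pv = project_onto_plane(v, normal)
    sine = dot(cross(pu, pv), normal) / math.sqrt(dot(normal, normal))
    return math.degrees(math.atan2(sine, dot(pu, pv)))
```

Using `atan2` rather than `acos` keeps the sign of the rotation, which is what the orientation conventions (i) to (iii) are designed to make unambiguous.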

F. Circular dichroism spectroscopy
Proteins used for circular dichroism (CD) spectroscopy were buffer exchanged into 10 mM sodium phosphate pH 6.8, 50 mM NaCl, 1 mM DTE using PD10 minitrap columns (Cytiva), and diluted to approximately 2 µM. CD measurements were performed on a Chirascan CD spectrometer (Applied Photophysics) using 1 mm path-length cuvettes (Precision Cells, 110-QS, Hellma Analytics). CD spectra were recorded between 200 and 280 nm at a bandwidth of 1 nm with a rate of 0.5 s/nm. The data of five scans were averaged and converted to mean residue ellipticity to account for differences in the measured concentrations and in construct length (see Section IV). Uncertainties were estimated based on the standard error of the mean of the CD readings and a 10% error to approximate uncertainties in concentration.
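The conversion to mean residue ellipticity can be illustrated with the commonly used relation between observed ellipticity, molar concentration, path length and chain length. The sketch below assumes this standard convention (units of deg cm² dmol⁻¹); the function and argument names are hypothetical, and whether the authors normalized by residues or by peptide bonds is not stated in the text.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    theta_mdeg : observed ellipticity in millidegrees
    conc_molar : protein concentration in mol/L
    path_cm    : cuvette path length in cm (a 1 mm cuvette is 0.1 cm)
    n_residues : number of residues used for normalization (assumption)
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)
```

Because the result scales inversely with concentration, a 10% concentration uncertainty translates directly into a 10% uncertainty in the reported ellipticity.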

Sample preparation
Protein-DNA chimeras based on Sfp-mediated conjugation were produced essentially as described previously [24,25]. Reaction volumes of 50 to 100 µL containing 50 mM HEPES pH 7.5, 10 mM MgCl2, 10 µM ybbR-tagged protein, 20 µM CoA-oligo (Biomers) and 10 µM Sfp-synthase (made in-house; the plasmid was a kind gift from the Gaub Lab at the LMU, Munich) were incubated overnight at room temperature. If necessary, yields of the desired product were increased by performing the reaction with 40 µM CoA-oligo and 20 µM Sfp-synthase.
Protein-DNA chimeras based on cysteine-maleimide reactions were produced as described previously [26]. In brief, proteins were reduced with a 10-fold excess of TCEP (Sigma Aldrich) for at least 30 min, desalted into phosphate-buffered saline (PBS) using a HiTrap Desalting 5 mL column (GE Healthcare), and reacted with a 10-fold excess of DBCO-maleimide (Sigma Aldrich) for at least 2 h. After renewed desalting, 10 µM protein was then reacted with 20 µM azide oligo (Integrated DNA Technologies) in 100 µL volumes overnight at 37 °C in an orbital shaker.
Samples were purified using a Superdex 200 10/300 GL (GE Healthcare) or YMC Pack Diol-300 (Yamamura Chemical Research) column equilibrated in 50 mM Tris-HCl pH 7.5, 150 mM NaCl. Fractions containing protein conjugated to two oligos were identified by SDS-PAGE, and 4 to 10 µL of those fractions were incubated with 100 to 200 ng biotin- or digoxigenin-functionalized DNA handles at room temperature for at least 30 min. Less than 1 µL of that mixture was added to anti-digoxigenin beads in 10 µL measuring buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl) and incubated for less than 5 min. Then, 0.5 to 0.7 µL of this mixture was added to 50 µL containing streptavidin beads and an oxygen scavenger system consisting of 0.65% (w/v) glucose (Sigma), 13 U/mL glucose oxidase (Sigma), and 8500 U/mL catalase (Calbiochem). Anti-digoxigenin and streptavidin beads were produced in-house using carboxyl-functionalized 1 µm beads (Bangs Laboratories) [27]. The final mixture was introduced into a home-built chamber that had been blocked with 10 mg/mL BSA for at least 5 min and washed with measuring buffer twice.

Data acquisition
All experiments were conducted on a custom-built, dual-beam setup with back-focal plane detection, with both traps having a stiffness of 0.25 to 0.35 pN/nm. An acousto-optical deflector was used to move one bead away from (or towards) the other at speeds ranging from 10 nm/s to 5 µm/s. Bead positions were tracked using a photo-diode detector. Signals were filtered at 50 kHz using an 8-pole Bessel filter, acquired at 100 kHz and downsampled to 20 kHz before storage.
Averaged force-distance curves were obtained from constant-velocity pulling cycles at ≤100 nm/s, where there was no detectable hysteresis, by binning and averaging several stretch FDCs at typically 100 different trap distances.

A. Fitting of raw FECs
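Force-extension curves (FECs) were fit with an extensible worm-like chain (eWLC) model,

$$F = \frac{k_B T}{p_D}\left[\frac{1}{4\left(1 - \frac{\xi}{L_D} + \frac{F}{K}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_D} - \frac{F}{K}\right], \tag{S3}$$

to model the DNA force response [28], and with a worm-like chain (WLC) model,

$$F = \frac{k_B T}{p_p}\left[\frac{1}{4\left(1 - \frac{\xi}{L_c}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_c}\right], \tag{S4}$$

to model the unfolded polypeptide [29], where $\xi$ is the extension, $k_B$ is the Boltzmann constant, $T$ the temperature, $p_D$ the persistence length of DNA, $L_D$ the contour length of the DNA and $K$ its elastic stretch modulus, and $p_p$ and $L_c$ are the persistence and contour length of the protein, respectively. Theoretical and measured protein contour lengths are listed in Tab. S3. On average, we found that $p_D = 21.6 \pm 0.6$ nm, $K = 730 \pm 70$ pN and $p_p = 0.70 \pm 0.01$ nm (mean ± SEM). $L_D$ correlated with the number of repeats (see Fig. S11).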
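As an illustration of the two polymer models used for fitting, the sketch below evaluates the standard WLC interpolation formula and solves the implicit extensible WLC relation for the force by bisection. It is a minimal example, not the fitting code used in this work: the function names are hypothetical, the thermal energy of 4.11 pN nm (roughly room temperature) is an assumption, and only extensions shorter than the contour length are handled.

```python
KBT = 4.11  # thermal energy in pN nm; assumed value, temperature is not stated here

def wlc_force(ext, contour, persistence, kbt=KBT):
    """WLC interpolation formula: force (pN) at extension ext (nm)."""
    rel = ext / contour
    return kbt / persistence * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)

def ewlc_force(ext, contour, persistence, stretch_modulus, kbt=KBT):
    """Extensible WLC: solve the implicit force equation by bisection."""
    if not 0.0 <= ext < contour:
        raise ValueError("this sketch only handles 0 <= ext < contour")

    def residual(force):
        rel = ext / contour - force / stretch_modulus
        return kbt / persistence * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel) - force

    low, high = 0.0, stretch_modulus   # residual(low) >= 0 and residual(high) < 0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
```

For example, `ewlc_force(90.0, 100.0, 21.6, 730.0)` uses the average DNA parameters quoted above with a hypothetical 100 nm contour length; the finite stretch modulus lowers the force relative to the inextensible chain at the same extension.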

B. Extracting average unfolding and refolding forces
Due to the nature of their unfolding transition, it was not possible to extract unfolding forces in the traditional sense, i.e. the force at which a protein or a subdomain unfolds completely (the force peak). The force data were processed using Igor Pro (Wavemetrics) and analysed further in Python. The data of each force curve were binned into a histogram, giving rise to clear peaks corresponding to the baseline and the unfolding plateau (Figure S14A). The positions of these peaks were extracted from the histogram using a sum of two Gaussian functions and a linear dependence of the background noise on force (force clamping):

$$P(F) = mF + c + \sum_{i=1}^{2} a_i \exp\left[-\frac{(F - \mu_i)^2}{2\sigma_i^2}\right], \tag{S5}$$

where $P(F)$ is the probability density of force values, $m$ and $c$ are the slope and intercept of the noise level, and $a_i$ is the scaling factor, $\mu_i$ the mean and $\sigma_i$ the standard deviation of each Gaussian.
FIG. S14. Calculating the forces and energies of TPR unfolding transitions. (a) The mean unfolding force is extracted by fitting a Gaussian function (red) to a histogram of forces (right) which was derived from the raw data (left, plotted as force against its index array). (b) The non-equilibrium energies of unfolding are simply the area (shaded light blue) between the unfolding curve and the contour of the fully extended construct.
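The area-based energy estimate in panel (b) amounts to integrating the difference between two force traces over trap distance. A minimal sketch using the trapezoidal rule is given below; the function name is hypothetical, and both traces are assumed to be sampled at the same trap distances (forces in pN and distances in nm give work in pN nm).

```python
def area_between(distance, trace, reference):
    """Trapezoidal integral of (trace - reference) over trap distance."""
    work = 0.0
    for i in range(1, len(distance)):
        width = distance[i] - distance[i - 1]
        left = trace[i - 1] - reference[i - 1]
        right = trace[i] - reference[i]
        work += 0.5 * (left + right) * width
    return work
```

Dividing the result by the thermal energy (about 4.11 pN nm at room temperature) expresses the work in units of $k_B T$.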

C. Estimating the work done by the trap/protein from constant velocity data
Force-extension curves taken at 10 nm/s and 100 nm/s were fitted with WLC models for both the DNA and the fully extended protein. The non-equilibrium energies, or the work done by or on the system, $W$, were then extracted from force-distance curves (FDCs) [30]. The work done on the protein, or the stretch work, is the integral over trap distance of the difference between the stretch trace, $U(d)$, and the FDC of the fully extended protein, $C(d)$, which corresponds to the area between those two curves (Figure S14B). The work done by the protein, or the relax work, is the corresponding integral of the difference between the force response of the unfolded protein and the relax trace, $R(d)$.

The full Hamiltonian of the entire system at a trap distance $d$ is given by

$$H_d(x, c) = H^{\mathrm{int}}(c) + H^{\mathrm{mech}}_d(x, c), \tag{S8}$$

where $H^{\mathrm{int}}(c)$ describes the conformation-dependent internal energy and $H^{\mathrm{mech}}_d(x, c)$ describes the mechanical energy stored in the system.
The mechanical energy $H^{\mathrm{mech}}_d(x, c)$ is the energy for mechanically stretching the system consisting of the linker and the Hookean spring of the optical trap. In the experimental configuration, the two mechanical parts consisting of dsDNA and unfolded polypeptide are in series (see Fig. S15). Hence, the extension of the full linker consisting of dsDNA and unfolded polypeptide is given by

$$\xi(F, c) = \xi_{\mathrm{eWLC}}(F) + \xi_{\mathrm{WLC}}(F, c) + \xi_{\mathrm{folded}}(c), \tag{S10}$$

where $\xi_{\mathrm{eWLC}}$ and $\xi_{\mathrm{WLC}}$ are given by eq. (S3) and eq. (S11). The extension of the folded protein $\xi_{\mathrm{folded}}$ was assumed to be independent of force, but dependent on the particular configuration $c$ of the protein, i.e. it contained information on the protein structure (see Fig. S16G and Section VII A below). The inverse of eq. (S10) yields the force on the construct as a function of the length of unfolded polypeptide and the total extension, $F_{\mathrm{construct}}(\xi, c)$. The mechanical properties of the dsDNA linker were modeled using eq. (S3), and the mechanical properties of the polypeptide part were modeled using

$$F = \frac{k_B T}{p_p}\left[\frac{1}{4\left(1 - \frac{\xi}{L_c(c)}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_c(c)}\right], \tag{S11}$$

where $L_c(c) = \left(N - \sum_{i=1}^{N} c_i\right) \cdot L_{aa} + L_{\mathrm{tag}}$ is the contour length of the unfolded polypeptide when the protein is in conformation $c$, $p_p$ is the persistence length of the unfolded polypeptide, $L_{\mathrm{tag}}$ is the contour length of the attachment tag and $L_{aa} = 0.365$ nm is the length of a single amino acid [31].
FIG. S15. Lengths and quantities used in the compliance model for a two-bead configuration (top) and the equivalent one-bead configuration (bottom).

A. Structure information
As highlighted in the main text, the models only accurately described the experimental data when the superhelical nature of CTPR proteins was considered. We incorporated this structural information into eq. (S10) by setting $\xi_{\mathrm{folded}}(c)$ to the sum of the end-to-end distances (Cα to Cα) of all folded stretches of helices, as given by the crystal structure.
For example, for a configuration 0111001111, we set $\xi_{\mathrm{folded}} = \xi_{2 \ldots 4} + \xi_{7 \ldots 10}$, where $\xi_{i \ldots j}$ is the crystal-structure end-to-end distance from the start of helix $i$ to the end of helix $j$.
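The bookkeeping in this example can be sketched as follows: the configuration word is split into contiguous folded stretches, and the folded extension is the sum of the corresponding end-to-end distances. The function names and the `end_to_end` lookup table are hypothetical; in practice the distances would be measured from the crystal structure.

```python
def folded_stretches(config):
    """Return (first, last) helix indices (1-based) of each folded stretch."""
    stretches, start = [], None
    for index, bit in enumerate(config, start=1):
        if bit == "1" and start is None:
            start = index                       # a new folded stretch begins
        elif bit == "0" and start is not None:
            stretches.append((start, index - 1))
            start = None
    if start is not None:                       # stretch running to the last helix
        stretches.append((start, len(config)))
    return stretches

def folded_extension(config, end_to_end):
    """Sum the end-to-end distances of all folded stretches.

    end_to_end maps (first, last) helix indices to a distance in nm.
    """
    return sum(end_to_end[stretch] for stretch in folded_stretches(config))
```

For the configuration 0111001111, `folded_stretches` returns the stretches (2, 4) and (7, 10), matching the two terms in the example above.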

B. Interaction models
We considered four different interaction models of subunits and their coupling. For all models, the folded protein extension ξ folded (c) was obtained from the crystal structure for each possible configuration (see Fig. S16).

Homopolymer repeat model
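In models based on a whole repeat (i.e. one A- and one B-helix) as the smallest independent protein unit, the internal energy of the protein is

$$H^{\mathrm{int}}(c) = \sum_{i=1}^{N} c_i\, \Delta G_{\mathrm{unit}} + \sum_{i=1}^{N-1} c_i c_{i+1}\, \Delta G_{\mathrm{nn}}, \tag{S12}$$

where $c_i = 1$ if subunit $i$ is folded and $c_i = 0$ otherwise, $\Delta G_{\mathrm{unit}}$ is the energy of a folded subunit and $\Delta G_{\mathrm{nn}}$ describes the energy of the next-neighbor interactions between two adjacent folded subunits (Fig. S16A). This is the simplest form of a one-dimensional Ising model.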
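The homopolymer energy of a given configuration follows directly from counting folded subunits and adjacent folded pairs. The sketch below illustrates this counting; the function name is hypothetical and the energies are in arbitrary units (for example $k_B T$).

```python
def homopolymer_energy(config, dg_unit, dg_nn):
    """Internal energy of a configuration word such as '11011'.

    Each folded subunit ('1') contributes dg_unit; each pair of adjacent
    folded subunits contributes dg_nn.
    """
    folded = [int(bit) for bit in config]
    n_units = sum(folded)
    n_pairs = sum(a * b for a, b in zip(folded, folded[1:]))
    return n_units * dg_unit + n_pairs * dg_nn
```

For a fully folded array of N subunits this reduces to N times the unit energy plus (N − 1) times the coupling energy.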

Homopolymer helix model
The homopolymer helix model is equivalent to the homopolymer repeat model, but subunits consist of helices instead of repeats. Just as for the repeat model, interaction energies only affect next neighbors (Fig. S16B).

Heteropolymer helix model
This model takes into account that the two α-helices in a repeat are different and thus may be parameterized by different energies. Only next-neighbor energies are allowed. The internal energy is given by

$$H^{\mathrm{int}}(c) = n_A \Delta G_A + n_B \Delta G_B + n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA}, \tag{S13}$$

where $n_A$ is the number of folded A-helices in conformation $c$, $n_{AB}$ is the number of folded pairs of A and B helices, $n_{BA}$ is the number of folded pairs of B and A helices, etc. (see Fig. S16C).

Heteropolymer helix nearest & next-nearest (NNN) model
This model accounts for contacts between adjacent A-A and B-B helices found in the crystal structure and assigns corresponding energies (Fig. S16D). The internal energy of the protein is

$$H^{\mathrm{int}}(c) = n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA} + n_{AA} \Delta G_{AA} + n_{BB} \Delta G_{BB} + n_A \Delta G_A + n_B \Delta G_B. \tag{S14}$$

Here, $n_{AB}$ is the number of adjacent folded A and B helices, and so on. Unfolded helices are considered to break contacts between next-nearest neighbors, such that a configuration ABA would contribute toward $n_{AA}$, but A-A would not. We note that both the heteropolymer helix model and the heteropolymer helix NNN model can be mapped to the repeat model when

$$\Delta G_{\mathrm{unit}} = \Delta G_A + \Delta G_B + \Delta G_{AB} \quad \text{and} \quad \Delta G_{\mathrm{nn}} = \Delta G_{BA} + \Delta G_{AA} + \Delta G_{BB}, \tag{S15}$$

with $\Delta G_{AA} = \Delta G_{BB} = 0$ for the heteropolymer helix model. For all models, the total energy of a protein with $N$ repeats is then

$$\Delta G_{\mathrm{tot}} = N \Delta G_{\mathrm{unit}} + (N - 1) \Delta G_{\mathrm{nn}}. \tag{S16}$$
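The contact counting in the NNN model, and its mapping onto repeat-level energies for a fully folded array, can be checked with a short sketch. The code assumes that the helix word alternates A, B, A, B, ... starting with an A-helix and that '1' marks a folded helix; the function names and the numerical energies in the usage note are purely illustrative.

```python
def nnn_counts(helices):
    """Count folded helices and contacts in a helix word such as '1110'."""
    folded = [bit == "1" for bit in helices]
    counts = {"A": 0, "B": 0, "AB": 0, "BA": 0, "AA": 0, "BB": 0}
    for i, is_folded in enumerate(folded):
        if not is_folded:
            continue
        kind = "A" if i % 2 == 0 else "B"
        other = "B" if kind == "A" else "A"
        counts[kind] += 1
        if i + 1 < len(folded) and folded[i + 1]:
            counts[kind + other] += 1          # nearest-neighbor contact
            if i + 2 < len(folded) and folded[i + 2]:
                counts[kind + kind] += 1       # needs the helix in between to be folded
    return counts

def nnn_energy(helices, dg):
    """Internal energy for a helix word, given a dict of the six energies."""
    counts = nnn_counts(helices)
    return sum(counts[key] * dg[key] for key in counts)
```

For a fully folded five-repeat array (`"1" * 10`) with illustrative energies `{"A": 1.0, "B": 2.0, "AB": -3.0, "BA": -4.0, "AA": -0.5, "BB": -0.25}`, the result equals five times the sum of the A, B and AB energies plus four times the sum of the BA, AA and BB energies.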

C. Calculation of force-distance curves
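Under equilibrium conditions, the mean bead deflection $\langle x \rangle$ for a given trap distance $d$ is

$$\langle x \rangle (d) = \frac{\sum_c \int x\, e^{-H_d(x, c)/k_B T}\, \mathrm{d}x}{\sum_c \int e^{-H_d(x, c)/k_B T}\, \mathrm{d}x}, \tag{S17}$$

where $H_d(x, c)$ is the full Hamiltonian of the system (eq. (S8)), which also depends on the model-dependent energies (e.g. $\Delta G_{\mathrm{nn}}$, $\Delta G_{\mathrm{unit}}$), which are omitted here for ease of notation. Consequently, a force-distance curve (FDC) can be calculated from the mean deflection and the spring constants $k_1$ and $k_2$ of the two traps.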
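The Boltzmann-weighted average over deflections and configurations can be sketched numerically on a uniform deflection grid. This is a toy illustration, not the CUDA implementation used in this work: the function name is hypothetical, the Hamiltonian is supplied by the caller, and the grid spacing is assumed to be uniform so that it cancels between numerator and denominator.

```python
import math

def mean_deflection(hamiltonian, configs, x_grid, kbt=1.0):
    """Boltzmann-weighted mean deflection over a grid and a set of configurations.

    hamiltonian(x, c) returns the energy of deflection x in configuration c.
    """
    energies = [(x, hamiltonian(x, c)) for c in configs for x in x_grid]
    e_min = min(energy for _, energy in energies)   # shift for numerical stability
    weights = [(x, math.exp(-(energy - e_min) / kbt)) for x, energy in energies]
    partition = sum(weight for _, weight in weights)
    return sum(x * weight for x, weight in weights) / partition
```

For a single configuration with a harmonic energy centred at a given deflection, the function returns that centre, which provides a simple sanity check before plugging in a full model.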

D. Calculation of unfolding profile
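Similarly, the probability of a subunit $i$ to be folded at a given trap distance $d$ is

$$p_i(d) = \frac{\sum_c \int \delta_i(c)\, e^{-H_d(x, c)/k_B T}\, \mathrm{d}x}{\sum_c \int e^{-H_d(x, c)/k_B T}\, \mathrm{d}x}, \tag{S19}$$

where

$$\delta_i(c) = \begin{cases} 1, & \text{if the $i$-th bit of word $c$ is set,} \\ 0, & \text{otherwise.} \end{cases} \tag{S20}$$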

E. Minimal folding unit under load
To determine the size of the minimal folded unit under force conditions, we first numerically determined $d^* = \{ d \mid p(c = 0) = \tfrac{1}{2} \}$, i.e. the distance at which the unfolded configuration is as populated as all other configurations combined, where

$$p(c) = \frac{\int e^{-H_d(x, c)/k_B T}\, \mathrm{d}x}{\sum_{c'} \int e^{-H_d(x, c')/k_B T}\, \mathrm{d}x} \tag{S21}$$

is the relative population of conformation $c$. The minimal folded unit was then calculated as the mean number of folded subunits of all other configurations $c \neq 0$, weighted by their population.

F. Minimal folding unit in the absence of load
We define the minimal folding unit in the absence of load as the smallest number of contiguous folded subunits for which the total energy of the protein becomes negative.
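With the homopolymer expression for the total energy, this definition reduces to finding the smallest n for which n times the unit energy plus (n − 1) times the coupling energy is negative. A minimal sketch is shown below; the function name and the search limit are hypothetical.

```python
def minimal_folding_unit(dg_unit, dg_nn, max_units=1000):
    """Smallest number of subunits n with n*dg_unit + (n-1)*dg_nn < 0.

    Returns None if no such n exists up to max_units.
    """
    for n in range(1, max_units + 1):
        if n * dg_unit + (n - 1) * dg_nn < 0:
            return n
    return None
```

As an example, the consensus ankyrin parameters quoted in the figure caption above (5.56 and −24 in units of $k_B T$) give a minimal folding unit of two repeats, since a single repeat has positive energy while two coupled repeats do not.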

G. Computation and simplification
FDCs were calculated by numerically evaluating eq. (S18) using custom-written CUDA software on a GeForce RTX 2080 graphics card (Nvidia). Even though massive parallelization greatly accelerated the computation, the calculations were still too expensive for long repeat molecules, such as the 26-repeat protein in the helix models with a conformational space size of $2^{52} \approx 5 \times 10^{15}$. A matrix formalism, which was previously employed to reduce model complexity in chemical unfolding [11], could not be used to describe the mechanical unfolding because of the non-linear contributions of the linker molecules (DNA and unfolded polypeptide) to the mechanical energy. Instead, we considered two simplifications that reduced the conformational space by eliminating extremely unlikely high-energy configurations.

Skip approximation
In helix models, we excluded all configurations in which an individual helix was folded without adjacent folded neighbors (e.g. 010111), or in which two adjacent helices were folded without a stabilising neighbor (e.g. 110111). These simplifications were in accordance with previous experimental findings that individual repeats are not stable in solution and resulted in a reduction of the computational complexity from $O(2^N)$ to $< O(1.65^N)$.
The simplifications allowed us to calculate FDCs for molecules of all repeat lengths. However, the computational cost for the longest molecules was still very high (≈60 h per iteration for one FDC with $\approx 4 \times 10^{10}$ configurations of a 26-mer in the Skip approximation) and prevented us from using these approximations in a fit function.
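The size of the reduced conformational space can be illustrated by counting binary words in which every run of folded helices is at least three helices long, which is one reading of the two exclusion rules above. The sketch below uses a small dynamic program and a brute-force check; the function names are hypothetical and the minimum run length of three is an interpretation of the text, not a stated parameter.

```python
from itertools import product

def count_skip_configs(n_helices, min_run=3):
    """Count words of length n_helices whose runs of 1s all have length >= min_run."""
    # state[0]: prefixes ending in 0 (or empty); state[k]: ending in a run of k ones,
    # with k capped at min_run (a run that is already long enough).
    state = [1] + [0] * min_run
    for _ in range(n_helices):
        new = [0] * (min_run + 1)
        new[0] = state[0] + state[min_run]      # append 0 only after 0 or a complete run
        for k in range(1, min_run + 1):
            new[k] = state[k - 1]               # append 1: the run grows by one
        new[min_run] += state[min_run]          # append 1 to an already complete run
        state = new
    return state[0] + state[min_run]

def brute_force_count(n_helices, min_run=3):
    """Same count by explicit enumeration (only feasible for small n_helices)."""
    total = 0
    for bits in product("01", repeat=n_helices):
        runs = "".join(bits).split("0")
        if all(len(run) == 0 or len(run) >= min_run for run in runs):
            total += 1
    return total
```

Under this reading, a 52-helix word has on the order of $10^{10}$ allowed configurations, compared with $2^{52}$ without the restriction.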

Zipper approximation
Therefore, we also considered a zipper approximation, in which unfolding always occurs from the ends and configurations such as 11101111 do not exist. This model was of complexity $O(N^2)$ and could easily be fitted to all molecules.
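The quadratic scaling of the zipper approximation can be made explicit by enumerating the allowed configurations: the fully unfolded state plus every word with a single contiguous folded block. The sketch below is illustrative only and the function name is hypothetical.

```python
def zipper_configs(n_units):
    """All words with at most one contiguous block of folded subunits."""
    configs = ["0" * n_units]                   # fully unfolded state
    for start in range(n_units):
        for end in range(start + 1, n_units + 1):
            configs.append("0" * start + "1" * (end - start) + "0" * (n_units - end))
    return configs
```

The number of configurations is 1 + N(N + 1)/2, so even a 52-helix word yields only 1379 states to evaluate.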

Verification
In practice, we obtained the energy parameters by fitting the zipper approximation to molecules of all repeat lengths. We then verified that FDCs obtained from the Skip approximation, with the same energy parameters, closely reproduced the prediction of the zipper model (see Fig. S5A).
The resulting energies for all molecules for which the computation was feasible were identical within errors when comparing the Skip approximation and the zipper approximation (see Table 1 in the main text).

H. Error estimation and propagation
To determine the errors of the reported energies $\Delta G_{\mathrm{unit}}$, $\Delta G_{\mathrm{nn}}$ and $\Delta G_{\mathrm{tot}}$ (eqs. (S15), (S16)), we performed model fits to each individual molecule. The reported errors were then calculated by Gaussian error propagation based on the covariance matrix of the individual values of $\Delta G_A$, $\Delta G_B$, $\Delta G_{AB}$, $\Delta G_{BA}$, $\Delta G_{AA}$ and $\Delta G_{BB}$ and reported as standard error of the mean (SEM) [32].
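Because the reported energies are linear combinations of the fitted helix-level parameters, Gaussian error propagation reduces to a quadratic form with the covariance matrix. The sketch below shows this step in isolation; the function name is hypothetical, and how the covariance matrix was assembled from the per-molecule fits is not reproduced here.

```python
import math

def propagated_error(weights, covariance):
    """Standard error of sum_i weights[i] * p_i for parameters p with the given
    covariance matrix (list of lists)."""
    variance = 0.0
    for i, w_i in enumerate(weights):
        for j, w_j in enumerate(weights):
            variance += w_i * covariance[i][j] * w_j
    return math.sqrt(variance)
```

For the unit energy, for instance, the weights would be 1 for the A, B and AB parameters and 0 for the others; off-diagonal covariance terms can reduce or increase the propagated error relative to simply adding variances.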